Chapter 15
Epistemic Gains and Epistemic Games:
Reliability and Higher Order Evidence in 
Medicine and Pharmacology 


Barbara Osimani 


Abstract In this paper I analyse the dissent around evidence standards in medicine
and pharmacology as the result of two distinct ways of addressing epistemic losses in
our game with nature and the scientific ecosystem: an “elitist” and a “pluralist”
approach. The former focuses on reliability as the minimisation of random and
systematic error, and is grounded in a categorical approach to causal assessment,
whereas the latter focuses on the high context-sensitivity of causation in
medicine and in the soft sciences in general, and favours probabilistic approaches
to scientific inference, as better equipped to handle the defeasibility of causal
inference in such domains. I then present a system for probabilistic causal assessment
from heterogeneous evidence that does justice to the concerns of both positions,
while also incorporating “higher order evidence” (evidence/information about the
evidence itself) in hypothesis confirmation.


15.1 Introduction 


Medical science distinguishes itself from other sciences and technologies
by the joint interaction of the following phenomena:


1. Epistemic uncertainty with respect to the real state of nature and the outcome
of interventions is generally higher than in other natural sciences (Joffe 2011);
indeed, among the natural sciences, medicine is the closest to the social
sciences and the humanities;

2. Medical products are so-called “credence products”, that is, products for which
the consumer (medical community, patients, and public health system in a
more general sense) cannot evaluate the quality prior to (and often not even after)
consumption. Such a state of affairs is therefore characterized by information
asymmetry, which further interacts with the decision-makers’ risk attitude;

3. Medical choices are high-stakes decisions in terms of physical, psychological,
existential, and financial costs (Osimani 2012);

4. Information asymmetry and the high sensitivity of medical decisions can be
strategically exploited by producers of medical knowledge who have vested interests in
research outputs and their dissemination (such strategic behaviour may of course
also evolve in time) (Teira 2011; Teira and Reiss 2013; Holman 2015);

5. Medical decisions are intertwined with pressing ethical dilemmas touching
different anthropological dimensions (Papa 2014; Sgreccia 2007; Pessina 2009;
Beauchamp 2011; Faden and Beauchamp 1986; Scheu 2003).


This picture is especially vivid in pharmacology; the complex network of interests 
(financial issues, concern for reputation etc.), as well as legal rights and duties which 
frame the scientific and social ecosystem in which pharmacology is embedded make 
it a unique blend of science and technology. Indeed, it is manifest that in scientific 
domains characterised by vested interests, the production and evaluation of evidence 
is embedded in a strategic game, where agents obviously try to maximise their 
payoff. This state of affairs strongly emphasises the role of reliability (in its various 
aspects) as a decisive dimension of evidence. 

This paper focuses on the roots of methodological dissent in medicine as a dissent 
related to how reliability is conceptualised and warranted in contending schools of
thought.

The paper is organised as follows: in Sect. 15.2, I present the debate around 
standards of evidence in medicine and the role played by evidence from randomized 
controlled trials vs. evidence of biological mechanisms or other kinds of sources. 
Section 15.3 presents the two poles on which such debate rests: the “elitist game” 
and the “pluralist game”. These two distinct approaches focus on different aspects 
of reliability as a result of weighting the costs of different kinds of errors differently. 
In particular, elitists strive to maximise internal validity by minimising random 
and systematic error, while the pluralists are rather concerned by the high context- 
sensitivity of causes in the biological realm (and in the soft sciences in general), 
hence they are more concerned about the stability of causal knowledge across popu- 
lations and domains (failure of external validity and false predictions). In Sect. 15.4, 
I illustrate a meta-level perspective where the structure and organisation of the 
entire body of evidence brings its own contribution to hypothesis confirmation. In 
this perspective, epistemic dimensions of evidence itself, such as its reliability, con- 
sistency/coherence and (in)dependence also contribute to hypothesis confirmation 
in an interactive way, and these are connected with research organisation issues: 
how the scientific ecosystem is modelled and interacts with the broader social 
system. This perspective is concretely translated into a multilayer framework which
models probabilistic causal inference through evidence synthesis elaborated by 
De Pretis et al. (2019). This framework, henceforth “E-Synthesis”, reconciles var- 
ious concerns around causal inference in medicine and pharmacology, by allowing 
various kinds of evidence to jointly contribute to hypothesis confirmation, as well 
as to incorporate higher order evidence, such as evidence about the reliability of 
the sources and about its relevance with respect to the target population, hence to 
account both for elitists’ and pluralists’ concerns. 

From a philosophical point of view, “E-Synthesis” provides a solution that
accommodates the intuitions underpinning apparently conflicting concerns,¹ and
has the virtue of being able to be adapted to various theoretical stances on causality 
(counterfactual vs. process theories vs. regularity vs. inferentialist theories of 
causality) (see Poellinger 2018). Section 15.5 presents how “E-Synthesis” addresses 
and solves philosophical issues around causal inference in medicine; in particular, 
the privilege accorded to randomised controlled trials, the debate around evidence 
hierarchies, the epistemic status of evidence for biological mechanisms, and the role 
of higher order evidence (evidence about evidence itself) in hypothesis confirmation. 


15.2 Isolating Causes vs. Causes in Interaction: The Two 
Contending Paradigms 


The philosophical and methodological debate concerning evidence in medicine can 
be roughly made sense of as a sophisticated elaboration of arguments in favour of 
(or against) two main paradigms for scientific inference: a categorical approach — 
inherited from frequentist statistics — and a probabilistic approach (where hypothesis 
confirmation comes in degrees) — inherited from an inductive/Bayesian approach 
to scientific inference.² Nancy Cartwright (2007a), for instance, speaks of
“clinching” and “vouching” methods as distinctive ways in which causal inference
may be carried out in medicine and in the social sciences: vouchers, unlike
clinchers, do not force any conclusion, but rather suggest or support it more or less
strongly.


— Clinchers. In principle, clinchers deductively force their conclusions and deliver 
an acceptance/rejection verdict on the hypothesis under investigation, on the 
basis of a “modus tollens” reasoning: 


1 This is possible also because the topology of our framework reveals that these conflicts relate to
different levels or dimensions of causal inference and therefore can be deflated.

2 These two approaches intersect with other epistemic stances such as empiricism vs. methodological
pluralism, as well as various programs for causal inference from statistical data; however, the
above-mentioned dichotomy can be analyzed relatively independently of these perspectives.
$$\frac{H \supset E, \quad \sim E}{\sim H}$$


where H stands for hypothesis and E for evidence. However, in experimental
settings the syllogism is applied probabilistically, since the evidence ~E consists
in an event that is not impossible under the assumption that H holds, but
only very improbable. That is why ~E only invites the following statement,
also called the “Fisher disjunction”: either something very improbable happened,
but H is nevertheless true, or H is false. However, in practice hypotheses are
either rejected or not, and because of this categorical approach, clinchers are 
characterised by a greater inductive risk, which explains why they are focused 
on the probability of erroneous conclusions (Mayo and Spanos 2006; Sprenger 
2016). More importantly, the threshold for hypothesis acceptance/rejection is 
based explicitly on the degree of reliability that one wishes to obtain: that is, 
on significance levels, which are substantially based on the size of random error 
considered to be tolerable (for the purpose at hand). In the standard approach 
of hypothesis testing, the hypothesis under investigation regards the individual 
contribution of a putative cause to its effect (possibly moderated by some 
prognostic factors) and such hypothesis is contrasted with the so called null 
hypothesis of the cause under investigation making no difference to the observed 
effect. For an observed positive effect, the hypothesis space consists of three 
general alternatives: (1) either the treatment causes the observed effect; (2) or the 
effect is due to chance; (3) or the effect is due to some alternative confounding 
factors. Reliability theory, and methodology in general, define the instruments 
that should be used in order to exclude such alternative hypotheses as safely as 
possible: Large sample sizes help to exclude chance, by the law of large numbers 
(hence, the larger the sample sizes, the lower the random error, ceteris paribus); 
study design helps to evaluate the internal validity of the results, i.e. the extent 
to which alternative causes can be excluded. Therefore, clinchers are studies that 
are reliable both in the sense that they exclude chance (random error), and in the 
sense that they exclude bias or confounding (systematic error) to a large extent. 
In sum, clinchers maximise accuracy and internal validity. 

— Vouchers are methods whose results cannot force any conclusion, but only 
suggest it. They allow only defeasible inference, but they are flexible enough 
to incorporate new possibly conflicting evidence in the inferential framework, 
without necessarily eliminating old beliefs. This is a particularly appropriate
approach when causes are highly context-sensitive (such as in biology or the social
sciences). Reliability here refers rather to whether a causal link established for
some population may also hold in other contexts, and to how interacting causes
modulate the causal effect.
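
The contrast between the two kinds of method can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the original argument: a binomial outcome model, a null response rate of 0.5, a single alternative rate of 0.7, a 0.05 significance level, an even prior, and the function names `clincher_verdict` and `voucher_posterior`. The first function delivers a categorical verdict from a one-sided exact test; the second returns a degree of support via Bayes' theorem.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_value_upper(k, n, p_null):
    """One-sided p-value: probability of k or more successes if the null rate holds."""
    return sum(binom_pmf(i, n, p_null) for i in range(k, n + 1))


def clincher_verdict(k, n, p_null=0.5, alpha=0.05):
    """Categorical verdict: the null is rejected iff the p-value is below alpha.

    alpha is the tolerated rate of random error, fixed before seeing the data.
    """
    return "reject null" if p_value_upper(k, n, p_null) < alpha else "do not reject null"


def voucher_posterior(k, n, p_alt=0.7, p_null=0.5, prior_alt=0.5):
    """Graded support: posterior probability of the alternative (hypothetical two-hypothesis setup)."""
    like_alt = binom_pmf(k, n, p_alt) * prior_alt
    like_null = binom_pmf(k, n, p_null) * (1 - prior_alt)
    return like_alt / (like_alt + like_null)


if __name__ == "__main__":
    for successes in (14, 15):
        print(successes, clincher_verdict(successes, 20), round(voucher_posterior(successes, 20), 3))
```

Under these assumptions, 14 successes out of 20 do not lead to rejection of the null at the 0.05 level, whereas 15 do; the posterior, by contrast, rises gradually from one case to the other and can be revised when further evidence arrives.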


In medical research the two approaches are reflected in two different perspec- 
tives with respect to the evaluation and use of evidence coming from different 
sources/methods. 
The so-called “Evidence Based Medicine” (EBM) paradigm has emphasized the selection
of “best evidence”, meant as accurate and internally valid evidence, and
has therefore privileged some methods over others, on grounds of their higher
reliability warrant. The contending view criticizes this paradigm for being uselessly
and harmfully monistic and for leaning on misleading heuristics (such as evidence
hierarchies).³

In particular, critics of the EBM approach have emphasized its shortcomings in
dealing with external validity issues (Cartwright 2012; Clarke et al. 2014), which
are nothing other than the straightforward consequence of the context sensitivity
of causation, and in not paying due attention to the ontology of the phenomena
being investigated (Cartwright 2007d; Anjum 2012). Furthermore, EBM is blamed
for unknowingly bringing back in through the window what it threw out of
the door, i.e. “subjectivity” (Stegenga 2011), for not keeping its epistemic
promises (Worrall 2007b), for running against its intended goals by not
adequately distinguishing between harm and benefit assessment (Osimani 2014), and
for not being cost-effective in using the available evidence (De Pretis et al. 2019; Russo
and Williamson 2007).⁴

The debate has mainly been carried on by way of illustrative case studies where
one or the other approach fails to deliver the optimal information for making
clinical decisions or policies. For instance, Worrall (2007b,c) illustrates the case of
Extracorporeal Membrane Oxygenation for newborn babies affected by persistent
pulmonary hypertension. Notwithstanding the high success rate of the treatment and
knowledge of the mechanisms explaining that success, efficacy was considered
not to be established until proven by randomized controlled trials. This obviously
exposed the newborns in the no-treatment arm to almost certain and, given the
available treatment, avoidable death. Since the newborns in the treatment arm had
instead a much higher chance of survival, this was a dramatic case of lack of
equipoise. By pointing to the neglected sources of knowledge (mechanisms and
the past success record in uncontrolled case series) and by downgrading the reliability
of RCTs, Worrall explains why “there is no cause to randomise” (see Worrall 2008).

Cartwright exemplifies her theory of extrapolation by reporting on an integrated
nutrition program which, while fully successful in India, completely failed
in Bangladesh, and imputes this failure to a mismatch between the causal structure
and social mechanisms at work in the original vs. the target population (Cartwright
2012).

Russo and Williamson (2007) and Clarke et al. (2013, 2014) also provide a rich
list of case studies to support their view that different kinds of evidence are co-


3Also, a parallel issue concerns the strong preference accorded to the so called Potential 
Outcome Approach (POA) vs. more pluralistic views of causal inference and causality itself (see: 
Vandenbroucke et al. (2016)). 


4Not to mention the fact that, traditionally, epistemology has considered varied evidence as more 
confirmatory than repetitive data (see Osimani and Landes (2020) for a discussion of the “Variety 
of Evidence Thesis”.) 
supportive and provide a stronger basis for causal assessment, by providing distinct
types of information. 

By contrast, Howick and colleagues (2013) warn that knowledge of mechanisms always
needs to be complemented by randomised controlled trials, while the reverse does
not hold. The basis for this position is that biological pathways are extremely
complicated and rich in feedback loops and backup mechanisms, which may lead
to surprising outcomes at the phenotypic level. Although RCTs provide black-box
evidence only, they are, on this view, the only reliable basis for causal inference, in
that they deliver clear information about input-output data at the level of interest,
the clinical one.⁵

At its root, this dissent originates from the two sides playing different games
with nature and within the scientific ecosystem. EBM plays an “elitist” game
where evidence is put to the test before entering the court: a certain threshold for
reliability is established before the evidence is considered at all. Hence reliability
is the gatekeeper and has precedence over other possible epistemic values such as
the principle of total evidence, the precautionary principle, and external validity.
Furthermore, the notion of reliability developed within the EBM framework is
method-specific; it has been constructed around a specific statistical school of thought:
frequentist hypothesis testing.

Opponents of the EBM approach play a different game: they are mainly
concerned with the context-sensitivity of causation and, more generally, with the
defeasibility of scientific inferences; consequently they appeal to the Principle of
Total Evidence as an essential desideratum in non-monotonic reasoning, as opposed to the
hypothetico-deductive paradigm; furthermore, they are also worried about the extent
to which such evidence may be used for extrapolation and prediction. Hence for
them the relevance of the evidence and external validity also play an important role.
In both games, there is a considerable amount of uncertainty, but dissent arises as
to how such uncertainty should be managed and weighed,⁶ and consequently, as to
the costs and utilities of each research strategy, or more generally, of any regulatory
standards.⁷


5They cite the example of the drug “Torcetrapib”, as a case where even perfect knowledge of its 
functioning mechanisms did not enable its producers to avoid excess death rates in the treatment 
arm of the third-phase trials, and therefore denial of approval. However, this failure was in fact due 
to lack of knowledge about (the mechanisms leading to) the side-effect, for which knowledge of 
the mechanisms for the intended effect is not necessarily helpful. 

6In the statistical counterparts of these games (frequentist vs. Bayesian statistics) probability as a 
measure of uncertainty is attached to the probability of error (in the long run) in one game, whereas 
in the other it is attached to the hypothesis itself. 

7See also Podolsky and Powers (2015) for a critique on a recent shift in FDA evidential standards, 
from what I call an “elitist” to a “pluralist” view. The considerations underpinning such critiques 
are cast exactly in terms of the costs and benefits of each regulatory approach. Analogously 
Osimani (2014) and Osimani and Mignini (2015) insist that evidential standards should not be 
the same for assessing efficacy and harm, exactly on these epistemic and strategic grounds. 
15.3 The Elitist and the Pluralist Game


The above games are associated with different epistemic goals and value specific
epistemic virtues over others: it is this different preference ordering that might
help us understand the dissent. Let us look more closely at these preference
orderings and the related payoffs and losses. In the EBM framework reliability (in
both senses) is highly valued: the inferential game is seen as a game against nature,
chance and biased scientists, and truth is what remains after all other factors have
been eliminated. The pluralist stance is more concerned with the complexity of causal
phenomena and with combining a plurality of evidence to find truth in the web of
interactions.

The general idea here is that whereas EBM puts efforts in decontextualising 
nature’s signals from noise and is more focused on science distorting them (through 
unreliable instruments), its opponents are rather worried that nature’s signals are 
embedded in a symphony and cannot be interpreted in isolation. Metaphorically 
speaking, the point is whether one considers context as noise to be eliminated, or as 
music which gives meaning to the individual note. 


15.3.1 The Elitist Game 


Clinching methods are elitist methods which value evidence for the degree to which
it excludes random and systematic error. Random error refers to the “chance”
variability around an otherwise reliable measurement. It can be reduced by
averaging the results of various replications of the study. Systematic error, instead,
refers to possible confounders or biases, which distort the results in the same
direction across replications.

The different strategies advocated by so-called Evidence Based Medicine have
been developed in order to secure studies from two kinds of error⁸:


1. random error; 
2. systematic error (this can be generated by bias, confounding or both). 


The strategies adopted to address these two kinds of error are orthogonal to each 
other, and are partly a consequence of the statistical approach entrenched in the 
EBM paradigm, that is, classical frequentist statistics. 


8The whole enterprise of the EBM paradigm can indeed be seen as the tireless effort to systematize 
a set of techniques to track and possibly minimize random and systematic error. 
15.3.1.1 Random Error


Random error is dealt with straightforwardly; indeed it is the basis of the entire 
statistical machinery of the frequentist approach: the level of tolerated random error 
is inscribed in the very methodological procedure by establishing it beforehand, and 
deciding on that basis, whether the hypothesis will be accepted or rejected, given 
the observed data. Hence, it is the desired “reliability level” (“significance level” and 
“power”) that determines the ultimate fate of the hypothesis under investigation. 

In this first sense, belief in the reliability of the result can be enhanced in 
two related ways: either by pooling results of various studies, thought to be 
homogeneous enough as to the sampled populations — so called meta-analyses — 
or by performing “identical” replications of the original study. 

However narrow the confidence interval or high the number of consistent
replications, these indicators of accuracy can never rule out another
kind of error, namely systematic error; that is, the error arising when the in principle
“identical” studies measure the effect of a confounder instead of that of the investigated
variable. Purely statistical considerations cannot address the problem of systematic
error, which is instead approached through study design.

It is as if we were dealing with three possible explanations when examining a
given scientific result: (1) chance, (2) the investigated causal factor, (3) other causal
factors (confounders, bias). Consistent replications reduce the probability that chance
produced the results, but can do nothing to discriminate between (2) and (3).
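
The point that replication tames random error but leaves systematic error untouched can be made concrete with a small simulation. This is only a sketch under stated assumptions: measurements are modelled as Gaussian draws, the confounder is modelled as a constant additive `bias`, and the function names and numerical values are hypothetical choices made for illustration.

```python
import random
import statistics


def run_study(true_effect, bias, noise_sd, n, rng):
    """One study: the mean of n noisy measurements centred on true_effect + bias."""
    return statistics.fmean([rng.gauss(true_effect + bias, noise_sd) for _ in range(n)])


def pooled_estimate(true_effect, bias, noise_sd, n, replications, seed=0):
    """Pool 'identical' replications; return (pooled mean, its standard error)."""
    rng = random.Random(seed)
    estimates = [run_study(true_effect, bias, noise_sd, n, rng) for _ in range(replications)]
    pooled_mean = statistics.fmean(estimates)
    standard_error = statistics.stdev(estimates) / replications**0.5
    return pooled_mean, standard_error


if __name__ == "__main__":
    # Assumed scenario: no true effect at all, but a confounder shifts every study by +1.0.
    for replications in (10, 400):
        mean, se = pooled_estimate(true_effect=0.0, bias=1.0, noise_sd=2.0, n=50,
                                   replications=replications)
        print(f"replications={replications:4d}  pooled mean={mean:.3f}  standard error={se:.3f}")
```

In this setup the standard error shrinks as replications accumulate, so the pooled estimate becomes ever more precise, yet it converges on the biased value rather than on the true effect: precision alone cannot tell explanation (2) from explanation (3).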


15.3.1.2 Systematic Error 


The entire EBM paradigm can be seen as an effort to foster quality of evidence by 
both insisting on maximisation of accuracy and minimisation of systematic error (or 
maximisation of “internal validity”). This effort is concretised in the development 
of evidence hierarchies that rank studies according to their design. Hence, one 
finds randomised controlled trials at the top level of the ranking (and higher than 
that, systematic reviews and meta-analyses of RCTs); followed by comparative 
observational studies (such as cohort and retrospective — or, historical — studies), and 
below these, uncontrolled observational studies (that is, case series, and single case 
studies). Evidence concerning cellular mechanisms or, generally, any data below the
phenotypic level is ranked lowest.⁹

Internal validity refers to the neutralisation of possibly confounding factors, as 
candidate alternative explanations for the observed effect. These may be alternative 
or interactive causes (also known as prognostic and predictive factors). The 
preferred means to counteract this sort of problem is to use intervention and 
randomization, so as to avoid (self-)selection bias and have as much as possible 


9 However, the reason for ranking data below the phenotypic level lowest has to do with issues of
external rather than internal validity (see Howick (2011)).
balanced groups in the experiment arms. This aims to guarantee that the observed
effect is due to the treatment, and only to it. However, random error is also taken
into account; hence, larger RCTs are ranked higher in the hierarchy than smaller
ones, and meta-analyses of several studies are ranked higher than individual studies.

The criteria underlying the standard design ranking of EBM hierarchies (with their
related epistemic goals) are as follows:


1. random vs. non-random sampling: this criterion is important both for the 
representativeness of the sampled population (of subjects or studies) and for the 
external validity of the results. A randomised controlled study, whose treatment 
and control arms are perfectly balanced, but whose subjects have not been 
sampled randomly from the target population will be internally valid, but not 
representative of the population under study (and will suffer from low external 
validity if the results need to be applied to that population). Indeed samples for 
randomised clinical trials are almost never sampled randomly from the original 
population. However, systematic reviews of meta-analyses aim to achieve this
same goal, in constructing a population of studies which is as representative as
possible of the sampled population.

2. experimental vs. observational design (RCTs vs. cohort studies): experimental 
interventions are the best warrant of internal validity in that they sever the
cause under investigation from other potential confounding factors, which could 
alternatively explain the observed difference in the two arms of the study; 

3. controlled vs. uncontrolled design (cohort or any other kind of comparative study
vs. series of clinical cases with no control group of “untreated” patients). Control
serves the purpose of checking whether the same effect is also observed
in the absence of the investigated cause; that is, of verifying the relevance of the
putative cause to the investigated effect.


Hence, (1) control serves the goal of verifying (causal) relevance; (2) intervention
through random allocation has the purpose of testing causal sufficiency; and (3)
random sampling is a way to establish causal necessity:


1. the statistical non-spurious difference between outcomes in the exposed and non- 
exposed groups provides evidence that the presence of the putative cause makes 
a difference with respect to the investigated outcome; 

2. random allocation ensures that no latent variable confounds the experimental
results, and thus ensures that the set of causes taken into account is complete,
although they are unknown. Causal sufficiency¹⁰ is warranted by the very
fact that the possible influence of all confounders (known and unknown) is
neutralized through random allocation.

3. the sample population for RCTs is very rarely sampled randomly from a source 
population. However, the ranking of systematic reviews of meta-analyses over 


10 In the causal search literature (Pearl 2000; Spirtes et al. 2000), causal sufficiency is a
fundamental assumption which grounds the algorithmic search, and undermines it if it fails to
hold.
simple meta-analyses responds to the recognition that the latter may be biased by
cherry-picking of the included studies. Systematic reviews ensure that the sampled
population of studies is non-biased and therefore representative of the source
population. This warrants that the “causal law” inferred from the RCTs is at least
valid in that specific population (the one from which the study population has
been sampled).
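
The role assigned to random allocation in this list can be sketched with a toy simulation. It is an illustration under assumptions that are not in the text: a single hidden binary prognostic factor `u`, an additive outcome model, Gaussian noise, and self-selection probabilities of 0.8 and 0.2; the function name `simulate_trial` and all numerical values are hypothetical.

```python
import random
import statistics


def simulate_trial(n, randomized, true_effect=1.0, confounder_effect=2.0, seed=1):
    """Return (difference in mean outcome, imbalance of the hidden factor) between arms.

    randomized=True  : treatment is assigned by a fair coin, independently of u.
    randomized=False : subjects with u self-select into treatment more often.
    """
    rng = random.Random(seed)
    outcomes = {True: [], False: []}
    hidden = {True: [], False: []}
    for _ in range(n):
        u = rng.random() < 0.5  # hidden prognostic factor, never observed by the analyst
        if randomized:
            treated = rng.random() < 0.5
        else:
            treated = rng.random() < (0.8 if u else 0.2)
        outcome = true_effect * treated + confounder_effect * u + rng.gauss(0.0, 1.0)
        outcomes[treated].append(outcome)
        hidden[treated].append(1.0 if u else 0.0)
    effect_estimate = statistics.fmean(outcomes[True]) - statistics.fmean(outcomes[False])
    imbalance = statistics.fmean(hidden[True]) - statistics.fmean(hidden[False])
    return effect_estimate, imbalance


if __name__ == "__main__":
    for randomized in (True, False):
        effect, imbalance = simulate_trial(20000, randomized)
        print(f"randomized={randomized}  estimated effect={effect:.2f}  imbalance in u={imbalance:.2f}")
```

With random allocation the hidden factor is spread nearly evenly across the arms and the difference in means stays close to the assumed true effect; with self-selection the same analysis attributes part of the confounder's influence to the treatment. The sketch says nothing about whether the result carries over to a population other than the simulated one.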


15.3.2 The Pluralist Game 


Advocates of a pluralist methodology oppose the idea that RCTs have a privi- 
leged role in establishing causation (Worrall 2007a,b) as well as the view that 
difference-making is necessary or sufficient to establish causation (Cartwright 
2007a). Objections to this paradigm have been raised on the following grounds: 


1. internal/external validity trade-off: the causal structure underpinning the study 
results in the sample population, may not be equivalent to the one which 
characterises the causal dynamics in the target population, hence even studies 
that are strongly internally valid may fail to provide the right evidence for other 
kinds of populations. Indeed a sort of “inverse proportionality” relationship is 
posited between internal and external validity: the stricter the study protocol 
(inclusion/exclusion criteria, mode of administration, etc.), the more likely that
the study result will be internally valid, but also that its results will lead to wrong
inferences in relation to real-life settings (Cartwright 2012)¹¹;

2. the putative privilege of randomised evidence is ill-founded (Worrall 2007b,c): 
that is, randomised controlled trials do not even provide any guarantee of 
internally valid results; 


As a constructive response to these criticisms, a conciliatory position has been
proposed, arguing for the adoption of a pluralistic approach to causal inference.
According to this view, neither statistical evidence nor evidence of the mechanisms
underpinning it is per se sufficient to establish causation; however, both are
necessary. This position has been elaborated in Clarke et al. (2013, 2014), where
it is argued that different sorts of evidence may have complementary roles
in supporting causal hypotheses. In particular, evidence about difference making
helps unmask causes which might be cancelled out by back-up compensatory
mechanisms in the organ system, whereas evidence about mechanisms is needed
in order to design and interpret statistical studies. The upshot of this reasoning is
that different kinds of evidence may systematically support each other and jointly
(dis)confirm the causal claim under investigation.


11 These criticisms have kindled a series of defenses of RCTs on various grounds: see Senn (2003), 
Papineau (1994), La et al. (2012), Teira (2011), Teira and Reiss (2013) and Osimani (2014) for an 
overview on the debate. 
De Pretis et al. (2019) have extended this approach to various indicators of
causality and relaxed the requirement that any of them be necessary, by allowing
for probabilistic, rather than categorical, causal assessment.¹²


15.3.3 Context-Sensitivity of Causality and Causal Modulation 


The insistence on mechanisms and external validity from proponents of the pluralist 
view on medical evidence straightforwardly derives from the intuition that causation 
in this scientific domain is highly context sensitive. But what exactly does
context-sensitivity mean? Roughly speaking, context sensitivity refers to the joint
interaction of various causes in bringing about a given effect and in modulating 
its intensity. Hence, subjects in studies can be conceived as vectors consisting of 
possible combinations of variables (at different levels: molecular, cellular, organic), 
that jointly contribute to the occurrence and the modulation of the effect in the 
presence (vs. absence) of the investigated cause. 
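
A minimal sketch may help fix the idea of subjects as vectors of interacting variables. It assumes, purely for illustration, two binary variables (a `cofactor` and an `inhibitor`) distributed independently in each population, and a non-additive response in which the treatment works only when the cofactor is present and the inhibitor absent; none of these names or numbers comes from the literature cited here.

```python
from itertools import product


def individual_effect(cofactor, inhibitor):
    """Hypothetical non-additive response: benefit only with the cofactor and without the inhibitor."""
    return 1.0 if (cofactor and not inhibitor) else 0.0


def average_effect(p_cofactor, p_inhibitor):
    """Population-average effect, weighting each covariate vector by its assumed prevalence."""
    total = 0.0
    for cofactor, inhibitor in product([True, False], repeat=2):
        weight = (p_cofactor if cofactor else 1 - p_cofactor) * (
            p_inhibitor if inhibitor else 1 - p_inhibitor
        )
        total += weight * individual_effect(cofactor, inhibitor)
    return total


if __name__ == "__main__":
    # Same causal mechanism, two populations with different mixes of interacting causes.
    print("study population :", round(average_effect(p_cofactor=0.9, p_inhibitor=0.1), 2))
    print("target population:", round(average_effect(p_cofactor=0.3, p_inhibitor=0.6), 2))
```

Under these assumptions the very same intervention shows an average effect of 0.81 in the first population and 0.12 in the second, although nothing about the individual-level mechanism has changed. An estimate that is accurate for the study population can therefore mislead when exported to a population with a different distribution of interacting causes.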

Leonid Hanin (2017) makes this point very vividly when he explains the 
irreproducibility of trial results by drawing on various sources of (uncontrollable) 
variation in clinical research. Speaking about predictors of metastatic recurrence 
after breast cancer surgery, Hanin points out that: 


Whether or not a metastases will escape from dormancy in a particular patient depends 
not only on the effects of treatment, functioning of the immune system, concentrations of 
circulating angiogenesis promoters and inhibitors, and other internal factors; exacerbation
of the disease may also be triggered by intercurrent sporadic external events such as 
surgery unrelated to breast cancer, infection, trauma, radiation, stress, etc. Another highly 
significant prognostic factor is the intrinsic aggressiveness of the disease; however, its 
reliable assessment at early stages of the disease has proven to be far elusive. Thus the most 
critical determinants of the trial outcome are largely unobservable and/or unpredictable. 


To make things worse, an additional level of opacity should be added to the 
picture (my emphasis): 


In practice the above observable prognostic factors are substituted with less informative 
observable surrogates such as (1) age at trial entry; (2) stage and historical grade of the
disease at surgery; (3) localization and size of the primary tumor; (4) whether or not the
tumor invaded surrounding tissues; (5) the extent of nodal involvement; (6) menopausal
status; (7) estrogen and progesterone receptor status; (8) presence of specific mutations
in BRCA1 or BRCA2 genes; (9) family history of breast cancer; and (10) individual
history of other malignancies. Even this rough and incomplete set of surrogate clinical 
variables creates a large number of categories of women in both arms of the trial with 
potentially very different characteristics of survival. Importantly, randomization won’t 


This stance can be made more general by drawing on the “Variety of Evidence Thesis” (VET) 
which, stated in its more general form, claims that ceteris paribus, the more varied the evidence, 
the higher the confirmatory support provided to the hypothesis which explains it. Taken at face 
value this claim seems to favour the pluralistic methodology approach over the “evidence elitist” 
view adopted in the EBM paradigm. See also Osimani and Landes (2020). 
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eliminate the observable and hidden heterogeneity; it will only reduce the difference in 
the extent of heterogeneity between treatment and control arms. The aforementioned inter- 
subject heterogeneity is quite typical of clinical trials (as opposed to in vitro experiments 
with cell lines or studies on animal models with tightly controlled inter-subject variation). 
Thus, individual responses of subjects in both arms of a trial cannot even approximately be 
viewed as homogeneous, let alone distributionally identical. 


(Hidden) mediators and moderators may also non-additively contribute to mod- 
ulating the causal effect through joint interaction. Human beings are not gas 
molecules; their reactions to the same treatment may be largely dependent on 
contingent factors (variation of individual response in different circumstances: intra- 
subject, or individual variation), and on various systematic mediating and interacting 
causes. Sample heterogeneity may be epistemically inscrutable since mediators 
and moderators may (and generally do) act jointly and at different organ levels 
(genetic/genomic, proteomic, cellular), and also emerge dynamically within one’s 
own clinical history and social environment. 

In “The Cement of the Universe” (1974) John Mackie offers a deterministic
reading of so-called probabilistic causation by advancing the concept of an INUS
condition, which provides a way to formalise the context-sensitive nature of
causation. In his proposal, causes come in sets of components (such as the
presence of oxygen for a spark to start a fire), and an INUS condition is an
insufficient but necessary component of an unnecessary but sufficient causal set.
For instance, “A is an INUS condition for E” means that A is part of at least one
conjunctive set standing in a biconditional relationship with E:


\[ (A \wedge B) \vee (A \wedge C) \leftrightarrow E \]


The “causal sets” can be equated to possible worlds which explain the occurrence 
of E in the presence of various concurring causal factors. For instance where the 
following holds: 


\[ (A \wedge B \wedge C) \vee (A \wedge D \wedge F) \vee (B \wedge D \wedge F) \leftrightarrow E \]


the same effect E can be caused by different causal sets, such as for instance 
(ABC), or (ADF), or (BDF). Hence in a study which investigates the causal 
effect of A with respect to E (for instance a certain medical treatment with respect 
to a given clinical outcome) such effect will be the result of the proportion of 
people having also the characteristics B and C or D and F over all other subjects 
(since neither ACF nor ABF, nor ACD nor ABD are sufficient causal sets for E). 
Furthermore, there will be cases where, notwithstanding the absence of A, E will 
nevertheless occur (that is, when subjects present the characteristics BDF jointly).13
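The dependence of the measured effect on the make-up of the sample can be made concrete with a
minimal sketch. It encodes the three causal sets of the example above and computes the risk
difference for A by enumerating all combinations of co-factors. The prevalence values, the
function names, and the assumption that the co-factors are mutually independent and independent
of A (as under ideal randomisation) are illustrative assumptions of mine, not part of Mackie's
account.

```python
from itertools import product

COFACTORS = ("B", "C", "D", "F")


def effect(s):
    """E occurs iff at least one sufficient causal set (ABC, ADF or BDF) is complete."""
    return (
        (s["A"] and s["B"] and s["C"])
        or (s["A"] and s["D"] and s["F"])
        or (s["B"] and s["D"] and s["F"])
    )


def risk_difference(prevalence):
    """Return P(E | A) - P(E | not A).

    `prevalence` maps each co-factor to its (hypothetical) frequency in the
    sample. Co-factors are assumed independent of each other and of A.
    """
    risk = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(COFACTORS)):
        # Probability of this particular combination of co-factors.
        weight = 1.0
        for name, present in zip(COFACTORS, values):
            weight *= prevalence[name] if present else 1.0 - prevalence[name]
        for a in (True, False):
            subject = dict(zip(COFACTORS, values), A=a)
            if effect(subject):
                risk[a] += weight
    return risk[True] - risk[False]


even = risk_difference({"B": 0.5, "C": 0.5, "D": 0.5, "F": 0.5})
bc_everywhere = risk_difference({"B": 1.0, "C": 1.0, "D": 0.5, "F": 0.5})
```

The same cause A, governed by the same biconditional, yields a different effect size in the two
samples simply because the co-factors B and C are distributed differently; and since BDF is
sufficient without A, some untreated subjects show E in both.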


13Since “causal strength” is generally measured by the “effect size”, that is, the proportion of
subjects in the sample who show the effect E in the treatment vs. control group (for instance
through relative risk ratio or odds ratio measures), causal strength can result either from a
relative context-insensitivity of the treatment investigated (or better, from the fact that it
contributes to the effect in many causal sets), or from its intrinsic force (as measured for
instance by a steep dose-response curve). This intrinsic ambiguity prompted the substitution of
the Bradford-Hill indicator “strength of the association” with three different indicators:
probabilistic dependence, dose-response, and rate of growth (De Pretis et al. 2019, Section 3).
See Sect. 15.4 below.

By drawing on this conceptualization of context-sensitivity, Cartwright (2012)
underpins her criticism of RCTs as the gold standard in medicine, on account of the
high context-sensitivity of causation in the soft sciences. Provided that the law
underpinning the observed effect in relation to treatment X is formalized as follows:


\[
L:\quad Y = a + \beta X + W \tag{15.1}
\]


where Y is the effect variable, X the cause variable, a a constant, $\beta$ a
coefficient and W a random error summarising the effect of hidden latent variables;
then the average value of Y in the treatment group will be measured by:


\begin{align}
\langle Y(u) \mid X(u) = x' \rangle ={}& \langle a(u) \mid X(u) = x' \rangle + \tag{15.2}\\
& \langle \beta(u) \mid X(u) = x' \rangle\, x' + \tag{15.3}\\
& \langle W(u) \mid X(u) = x' \rangle \tag{15.4}
\end{align}


and, consequently, the treatment effect will be measured by the following
equation:


\begin{align}
T =_{\mathrm{def}}{}& \langle a(u) \mid X(u) = x' \rangle - \langle a(u) \mid X(u) = x \rangle + \tag{15.5}\\
& \langle \beta(u) \mid X(u) = x' \rangle\, x' - \langle \beta(u) \mid X(u) = x \rangle\, x + \tag{15.6}\\
& \langle W(u) \mid X(u) = x' \rangle - \langle W(u) \mid X(u) = x \rangle \tag{15.7}
\end{align}


Since randomization is meant to warrant that the expectations of a(u), $\beta(u)$ and
W(u) are the same whatever value X assumes (that is, that X is probabilistically
independent from a, $\beta$ and W), the first and last two terms in the equation cancel
out; hence, the measure of the treatment effect given by an RCT results from the
following formula:


\[
T =_{\mathrm{def}} \langle \beta(u) \rangle\, (x' - x). \tag{15.8}
\]


The critical issue raised by Cartwright is that $\beta$ represents all the combinations
of factors that determine not only whether X contributes to Y (if $\beta$ is 0, then
this contribution is null), but also how (positively or negatively) and to what
extent. By drawing on the notion of causes as INUS conditions, Cartwright then
considers the possibility that $\beta$ is a disjunction of sets of interacting factors:


\[
\beta = f_1(z_{11}, \ldots, z_{1n}) + \ldots + f_m(z_{m1}, \ldots, z_{mp}).
\]

This translates into an irreducible context-sensitivity component of causation,
which cannot be fully accounted for by classical statistical methods of causal
inference. If the main information provided by experimental evidence is about
whether (and to what extent) a causal link between two phenomena exists, the
information provided by detailed case series, or basic science (molecular studies;
in vitro or so-called “in silico” evidence) may contribute to a greater extent to the
identification of specific subclasses in which the effect follows the treatment to a
higher or lower degree, that is, information about causal interactions.14 Analogous
information may also come from adjustment (stratification) and subgroup analyses 
in controlled studies. However, the more context-sensitive the treatment under 
investigation, the larger the sample sizes need to be in order to obtain such 
information. What is needed here is the acknowledgement that the two approaches
are focused on complementary (not opposite!) goals: the “elitist approach” is
concerned with establishing causal links rather independently from the task of
identifying mediators and moderators, and with minimizing false positives resulting from
random or systematic error. Reliability of the evidence consists in the isolation of
the causal link under investigation from possible interferers, and is a function of the 
process through which it is collected. Instead, the “pluralist approach” emphasises 
the irreducible context-sensitive nature of causation in medicine (and in the social 
sciences); therefore reliability here is about whether the acquired causal knowledge 
is stable enough to be applied to other contexts (prediction, external validity). Any 
source of information is valued as a useful basis to identify such interacting co- 
factors. 
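The point that an RCT measures only the average of the coefficient in Cartwright's law
(Eq. 15.8), and so can mask exactly the interactions the pluralist cares about, can be shown
with a small numerical sketch. The strata, their shares and their coefficients are hypothetical
values chosen for illustration; the functions are mine and assume ideal randomisation, so that
the terms in a and W cancel as described above.

```python
def average_treatment_effect(strata, x_treated=1.0, x_control=0.0):
    """Average effect under ideal randomisation: mean coefficient times dose contrast.

    `strata` is a list of (population_share, beta) pairs; shares must sum to 1.
    """
    total_share = sum(share for share, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    mean_beta = sum(share * beta for share, beta in strata)
    return mean_beta * (x_treated - x_control)


def stratum_effects(strata, x_treated=1.0, x_control=0.0):
    """Effect within each stratum, i.e. the information an average hides."""
    return [beta * (x_treated - x_control) for _, beta in strata]


# Hypothetical population: half benefit strongly, half are harmed equally strongly.
opposed = [(0.5, 2.0), (0.5, -2.0)]
# Hypothetical population: a small subgroup responds, the rest do not.
rare_responders = [(0.2, 5.0), (0.8, 0.0)]

print(average_treatment_effect(opposed), stratum_effects(opposed))
print(average_treatment_effect(rare_responders), stratum_effects(rare_responders))
```

In the first population the average effect is null although every subject is causally affected;
in the second, a moderate average effect conceals a strong effect confined to one subclass.
Identifying such subclasses is the task for which case series, subgroup analyses and basic
science are valued on the pluralist side.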

In the following I present a multilayer approach to causal inference in phar- 
macology, where both objectives are considered, and accounted for in a unifying 
framework. 


14The importance of predicting the variability of the effect as a function of the joint interaction
of possible co-factors hence casts a new light on sources of evidence such as case reports and
case series, which standard hierarchies rank as low because they are not “controlled” and hence
cannot provide high warrant of internal validity, but whose specific epistemic import is no
lower, indeed very high, in that they can provide us with valuable information about the various
scenarios in which a given treatment may (or may not) induce its effect in different degrees. So,
very detailed case reports facilitate inference about co-factors (prognostic factors and mediators)
possibly influencing whether, and to what extent, the effect occurs in a specific population.
“In silico” evidence comprises a huge class of methodologies which can be broadly subdivided
into systems biology approaches to computational modelling and simulation and machine learning
techniques for knowledge extraction or pattern recognition.
15.4 E-Synthesis: A Probabilistic Causal Inference with
Heterogeneous and Higher Order Evidence 


The so-called “reproducibility crisis” (see for instance the “Reproducibility Project:
Psychology” by the Open Science Collaboration) caused some stir among psychologists,
methodologists, and scientists more generally. The crisis extends well beyond
psychology and also affects medical research (Begley and Ellis 2012; Ioannidis
2005; Prinz et al. 2011).

While some analysts have provided formal confirmation for the plausibility 
of such explanations (Etz and Vandekerckhove 2016; Marsman et al. 2017), and 
some downplay the whole issue (Senn 2002), others have further insisted on 
the problem of noisy data and suggested that “to resolve the replication crisis 
in science we may need to consider each individual study in the context of an 
implicit meta-analysis” (Gelman 2015). By “meta-analysis” Andrew Gelman does 
not mean here the standard data-pooling methods developed for instance within the 
Cochrane Library initiative, where statistics from sufficiently homogeneous studies 
are averaged so as to obtain more accurate measures of effect sizes, but rather a 
global approach to evidence evaluation, which takes into account not only the prima 
facie supportive strength any individual study gives to the investigated hypothesis 
(and its alternatives), but also other “higher-order” dimensions related to the entire 
body of evidence: 


One direction for statistical analysis that appeals to me is Bayesian inference, an approach 
in which data are combined with prior information (in this case, the prior expectation that 
newly studied effects tend to be small, which leads us to downwardly adjust large estimated 
effects in light of the high probability that they could be coming largely from noise). (Gelman 
2015, p.35) (my emphasis). 


Therefore, higher order evidence may consist of consistent replications across 
repeated measurements, knowledge about the behaviour of measurement instru- 
ments under certain conditions, or more generally, robustness of results across 
methods and sources, and coherence of data with the established knowledge as well 
as with available theories. This evidence may interact with estimation about the 
reliability of the source, as a function of various social factors, such as financial 
interests, concern for liability and regulatory issues, scientific and commercial 
reputation. 
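Gelman's suggestion quoted above, that a prior expectation of small effects should pull large
noisy estimates downwards, can be sketched with the standard normal-normal conjugate update.
This is only one simple way of implementing the idea; the prior centred on zero, its width, and
the example numbers are assumptions made for illustration and are not taken from Gelman (2015).

```python
def shrink_estimate(estimate, std_error, prior_mean=0.0, prior_sd=0.1):
    """Posterior mean and sd of an effect under a normal prior and normal likelihood.

    estimate, std_error: the effect reported by a study and its standard error.
    prior_mean, prior_sd: hypothetical prior encoding "new effects tend to be small".
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    posterior_variance = 1.0 / (prior_precision + data_precision)
    # Precision-weighted average of prior mean and observed estimate.
    posterior_mean = posterior_variance * (
        prior_precision * prior_mean + data_precision * estimate
    )
    return posterior_mean, posterior_variance ** 0.5


# A large effect from a noisy study is pulled strongly towards the prior ...
noisy_mean, noisy_sd = shrink_estimate(estimate=0.6, std_error=0.3)
# ... whereas the same estimate from a precise study is barely adjusted.
precise_mean, precise_sd = shrink_estimate(estimate=0.6, std_error=0.01)
```

The amount of downward adjustment depends on how noisy the study is relative to the prior, which
is itself a piece of higher order information about the evidence rather than about the hypothesis.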

So far, there has been a division of labour among philosophers of science, 
epistemologists, and methodologists regarding these different layers. Philosophy of 
science and methodology (statistics) have focused on the prima facie relationship
between evidence and hypothesis, and related problems (measurement error, evidential
support, theory-ladenness, underdetermination, internal and external validity)
(Carnap 1956; Fisher 1955; Hacking 2006; Haack 2011; Lenhard 2006; Hoyningen- 
Huene 2013). More generally, classical and formal epistemology have paid attention 
to the formal warrants for knowledge justification (Audi 1993; BonJour 2009; 
Swinburne 2001). Social (formal) epistemology instead has concentrated its efforts 
on the complex framework of interests and constraints which mold the scientific
ecosystem (Goldman 1999; Longino 1990; Mayo-Wilson et al. 2011). 

With respect to medicine, Miriam Solomon (2015) has emphasised the social 
construction of medical knowledge, and David Teira has thoroughly examined the 
impact of the social environment (regulatory framework, economic interests, etc.) 
on the development of methods for evidence evaluation (see Teira, this volume, and
Teira and Reiss (2013) and Teira (2011))15; Bennett Holman frames the evolution of
regulatory tools for drug approval and monitoring as an epistemically asymmetric
arms race (Holman 2015).

In the present section I present a multilayer approach to modelling causal 
inference for pharmaceutical harm, that keeps track of the interactions between these 
different dimensions of evidence in a unitary framework (see Fig. 15.1): 


1. a basic level of evidential support to the hypothesis at hand (and various 
evidence aggregation/amalgamation techniques, see De Pretis et al. (2019)). 
This constitutes the traditional focus of philosophy of science with its various 
approaches to scientific inference (nomological-deductive, inductive, abductive, 
etc.) and of statistical inference; 

2. a higher order level of epistemic dimensions related to the entire body of evidence:
consistency/coherence of items of evidence, (in)dependence structure, reliability,
relevance. This level also pertains to various meta-inferential patterns such
as the “Non-Alternative-Argument” (Dawid et al. 2015); this domain has
been mainly investigated in Bayesian and formal epistemology. However, standard
statistical techniques developed to detect publication bias or other kinds
of biases also focus on these aspects (Krauth et al. 2013; Rising et al. 2008; Wood
et al. 2008; Lundh et al. 2017; Lundh and Bero 2017);

3. a further level comprising information/knowledge/evidence related to such
meta-epistemic dimensions themselves. This information/knowledge/evidence 
relates to a social epistemology level and can be more or less straightforwardly 
inferred from it: incentives/deterrents for bias (such as financial interests, 
reputation etc.), social ontology of the research domain, and nudging studies. 


The philosophical foundations of such a framework have been presented in 
De Pretis et al. (2019). The framework consists in a Bayesian epistemic network, 
borrowed from Bovens and Hartmann (2003), which formalises scientific inference 
probabilistically. The ancestor consists in the investigated causal hypothesis, while 
its direct descendants are the epistemic consequences of such hypothesis, namely, 
indicators of causality resulting from an elaboration of Bradford-Hill guidelines 
for causation (Hill 1965). Concrete data is represented by reports of study results 
attached to the relevant causal indicator. A reliability and a relevance node input 
into each report node in order to weight the evidence by its level of reliability 
and stability with respect to the context of application (target population), see
Fig. 15.2. By breaking down the evidential line between pieces of evidence and
causality into a two-stage process mediated by causal indicators, E-Synthesis helps
disentangle philosophical issues related to the conceptualisation of causality from
those related to causal inference and diagnostics.16 At least some of the disputes
among philosophers and methodologists may be solved by keeping these levels
distinct (see Sect. 15.5).


15Teira analyses for instance the role played by concerns about impartiality of research in the
historical establishment of randomised controlled trials as gold standards for drug approval.


Fig. 15.1 A multilayer approach to modelling probabilistic causal inference through evidence
synthesis: a. issues related to the evidential support provided by the evidence to the hypothesis at
hand as discussed both in the statistical literature as well as in the philosophy of science; b. issues
related to meta-evidential dimensions: consistency of studies, structure of the body of evidence
in terms of mutual dependence of observations, reliability of the pieces of evidence and their
relevance with respect to the target group; c. social dimensions of the pharmaceutical ecosystem
(funding structures, reputational concerns, regulatory constraints), generally discussed in the social
epistemology literature

Figure 15.3 illustrates the structure of the causal inference problem: the set
of dependencies and independencies determines a hierarchical structure where the
causal indicators, namely difference making ($\Delta$), probabilistic dependency (PD),
dose-response relationship (DR), rate of growth (RoG), evidence of mechanism (M),
information about time asymmetries (T), etc.,17 lie on the same level, whereas study
reports, possibly deriving from different methods, either observational (such as cohort or
retrospective studies) or experimental (such as Randomised Controlled Studies), lie
below this level.18


16This is also in analogy with Bogen and Woodward’s distinction between data and phenomena
(Bogen and Woodward 1988).


17These are derived from the epidemiological/causal literature (in particular from Bradford Hill
guidelines); see De Pretis et al. (2019).

To each report a reliability node is attached: this is intended to capture the degree 
of systematic error (confounding and bias) estimated to affect the source report, as 
well as a relevance node, referring to the representativeness of the study sample 
for the purpose of causal inference with respect to the target population. Random 
error is directly represented by the likelihood function mapping reports to abstract 
indicators. 
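How a reliability node modulates the confirmation that a report transmits to the hypothesis can
be sketched with a toy network of three binary parents and one report. The construction below
follows the witness model of Bovens and Hartmann (2003), from which the framework is said to
borrow: a reliable source reports the indicator exactly, an unreliable one reports positively
with a fixed probability regardless of the facts. All parameter values and names are
hypothetical, the relevance node is left out, and random error is ignored for brevity; the
actual specification of E-Synthesis is given in De Pretis et al. (2019) and may differ.

```python
from itertools import product


def posterior_hypothesis(prior_h, p_ind_given_h, p_ind_given_not_h, p_reliable, a):
    """P(hypothesis | one positive report), by brute-force enumeration.

    Network: Hypothesis -> Indicator -> Report <- Reliability.
    a: probability that an unreliable source reports positively anyway
       (the 'randomisation parameter' of the witness model).
    """
    joint_with_h = 0.0
    joint_total = 0.0
    for h, ind, rel in product([True, False], repeat=3):
        p = prior_h if h else 1.0 - prior_h
        p_ind = p_ind_given_h if h else p_ind_given_not_h
        p *= p_ind if ind else 1.0 - p_ind
        p *= p_reliable if rel else 1.0 - p_reliable
        # Reliable sources track the indicator; unreliable ones ignore it.
        p_positive_report = (1.0 if ind else 0.0) if rel else a
        p *= p_positive_report
        joint_total += p
        if h:
            joint_with_h += p
    return joint_with_h / joint_total


for r in (0.0, 0.5, 1.0):
    print(r, round(posterior_hypothesis(0.3, 0.9, 0.2, r, 0.5), 4))
```

With a fully unreliable source the report leaves the prior untouched; as the probability of
reliability grows, the same report confirms the hypothesis more and more strongly. This is the
sense in which reliability weights the evidence rather than adding to it.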

The graphical form provides an illustration of the epistemic dimensions at stake 
and thereby provides greater insight into some methodological issues by offering a 
mathematical explanation of their dynamics. This sort of representation allows one 
to single out in the mathematical formulae the specific role played by each epistemic 
dimension in the inferential dynamics; e.g., the role of reliability with respect to 
the propagation of confirmation in connection with replication of studies and with 
heterogeneous sources of evidence. This framework has several advantages19:


[Figure: network layers labelled Hypothesis, Causal Indication, Reliability, Evidence Reports,
Relevance]

Fig. 15.2 Graph structure of the Bayesian network for two reports and epistemic categories


18The framework also allows modeling and simulation studies to play a relevant confirmatory
role, especially with regard to the underlying dynamics underpinning the phenotypic causal effect.
See also Osimani and Poellinger (2020).


19Please refer to De Pretis et al. (2019) for basics and details of the framework.


barbaraosimani@gmail.com 




Fig. 15.3 Graph of the Bayesian network with one different report for every causal indicator
variable. Dots indicate that there might be further indicators of causality not considered




1. it identifies possible indicators of causality on the basis of the methodological
and philosophical literature on causality, evidence, and causal inference;

2. it embeds them in a topological framework of probabilistic dependencies and
independencies grounded in assumptions regarding their reciprocal epistemic
interconnections;

3. it weakly orders some of these probabilistic dependencies as a function of their
inferential strength with respect to the confirmation of causal hypotheses;

4. it easily accommodates many intuitions already expressed by philosophers of
medicine regarding pluralistic approaches to evidence evaluation;

5. it lends itself to explicitly tracking the interaction of several dimensions of evidence,
such as coherence and reliability;

6. it allows for a pluralistic but at the same time systematic approach to evidence
amalgamation;

7. it tracks the different roles of cross-validation through heterogeneous methods or
sources vs. the confirmatory contribution of exact replication (see also Osimani
and Landes (2020)).


I present here how this framework can accommodate intuitions coming from both 
sides of the dispute, by considering five specific issues: the EBM vs. pluralist 
approach to causal inference, evidence hierarchies, causal holism, relevance (exter- 
nal validity), and reliability. These issues also show how our framework provides 
a higher order perspective on these debates by effectively embedding these various 
epistemic dimensions in a concrete topology. 


15.5 Discussion 


By breaking down the evidential line between pieces of evidence and causality 
into a two-stage process mediated by causal indicators, E-Synthesis helps dis- 
entangle philosophical issues related to the conceptualisation of causality from 
those related to causal inference and diagnostics. Furthermore, the framework aims
at probabilistic causal assessment, and therefore bypasses problems generated by
accounts that aim to establish causal claims categorically (through the identification
of necessary and sufficient conditions for causation). More importantly, separate
nodes for relevance and reliability are embedded in the network; this helps
to deflate much discussion in the methodological literature concerning the trade-off
between these two dimensions of evidence quality.


15.5.1 The EBM vs. Pluralist Approach to Causal Inference 


Translated within the E-Synthesis framework, the implicit assumption underpinning
the EBM viewpoint is that ideal RCTs (that is, internally valid ones) are
perfect indicators of difference making, and difference making is a perfect indicator
of causality, whereas other indicators only weakly support the hypothesis of
causation. This can be represented in logical terms as an entailment relationship:


\[ \mathrm{RCT} \supset \Delta \supset ©. \]


As a consequence, EBM focuses on the level of evidence that goes through the $\Delta$
indicator, and concentrates its efforts on having evidence that is as reliable as possible
for such indicator.

The contending view is that different indicators may have complementary 
epistemic roles in supporting the hypothesis of causality. However, to this view, 
held for instance by Clarke et al. (2014), Howick counters that: “There are many 
cases where patient-relevant effects of medical therapies have been established by 
comparative clinical studies alone.” (Howick 2011, p. 939). 

The debate is jeopardised by conflating the two entailment relationships
$\mathrm{RCT} \supset \Delta$ and $\Delta \supset ©$ into one; that is, what is discussed is
whether it can be justifiably held that RCTs provide perfect information for causality
(directly), i.e. whether


\[ \mathrm{RCT} \supset ©, \]


rather than whether either $\mathrm{RCT} \supset \Delta$ or $\Delta \supset ©$ holds.
If we let the epistemic net of E-Synthesis represent this discussion, we can take
Howick to hold the view that:


\[ P(© \mid \Delta) = 1, \]

and that ideal RCTs provide strong evidence for $\Delta$:

\[ P(\Delta \mid \mathrm{RCT}) \approx 1, \]

hence offering strong enough evidence to establish the causal claim; while for the
contenders $P(© \mid \Delta)$ is too small to establish the causal claim. For them,
mechanistic evidence is also required to establish the causal claim:

\[ P(© \mid \Delta \,\&\, M) = 1. \]

E-Synthesis allows for $\mathrm{RCT} \supset \Delta \supset ©$ to hold, but the entailment
relationship between RCTs and $\Delta$ is one between ideal RCTs and $\Delta$. Hence in any
concrete case $P(\Delta \mid \mathrm{RCT}) < 1$, and this leaves room for other kinds of
evidence to also contribute to hypothesis confirmation. On the other side, by dropping any
necessity or sufficiency requirements for causal inference, this approach relaxes the theory
that both evidence of association and of underlying biological mechanisms are
necessary to establish causation. This approach is therefore much more flexible
and responds both to the EBM intuition underpinning the privileged role assigned
to RCTs, as well as to the pluralist intuition that various kinds of evidence
contribute to causal assessment.


20This directly derives from the potential outcome approach underpinning RCT methodology. See
Holland et al. (1985), Rubin (2005), and Vandenbroucke et al. (2016) for a critical appraisal of this
approach.
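A small numerical sketch may help to see why a concrete, less than ideal RCT leaves room for
other evidence. It applies the law of total probability to pass from difference making to
causation, and then a Bayesian update on a further item of evidence (say, mechanistic). The
numbers, the function names, and the two independence assumptions stated in the comments are
mine and serve illustration only; they are not the parameters of E-Synthesis.

```python
def p_cause_given_rct(p_delta_given_rct, p_cause_given_not_delta, p_cause_given_delta=1.0):
    """P(causation | RCT report) via the difference-making indicator.

    Assumes the report bears on causation only through difference making
    (the report is screened off from the hypothesis by the indicator).
    """
    return (
        p_cause_given_delta * p_delta_given_rct
        + p_cause_given_not_delta * (1.0 - p_delta_given_rct)
    )


def update_with_evidence(p, likelihood_ratio):
    """Bayesian update of probability p on further evidence.

    likelihood_ratio: P(evidence | causation) / P(evidence | no causation),
    assumed conditionally independent of the RCT report given the hypothesis.
    """
    if p >= 1.0:
        return 1.0  # a certain hypothesis cannot be confirmed further
    odds = p / (1.0 - p) * likelihood_ratio
    return odds / (1.0 + odds)


ideal = p_cause_given_rct(p_delta_given_rct=1.0, p_cause_given_not_delta=0.1)
concrete = p_cause_given_rct(p_delta_given_rct=0.8, p_cause_given_not_delta=0.1)
concrete_plus_mechanism = update_with_evidence(concrete, likelihood_ratio=4.0)
```

For the ideal trial nothing remains to be gained from further evidence, whereas for the concrete
one the additional item raises the probability of the causal hypothesis, without any kind of
evidence being declared necessary or sufficient.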


15.5.2 Evidence Hierarchies 


In relation to evidence hierarchies, which are a strong point of contention among
philosophers of medicine and methodologists, E-Synthesis poses a set of inequalities
regarding the evidential strength of various causal indicators (see De Pretis
et al. (2019, pp. 32–33)), which nicely parallels the rankings proposed in the EBM
paradigm:


\[ P(© \mid \Delta) > P(© \mid DR),\; P(© \mid RoG) > P(© \mid PD) \tag{15.9} \]


What differentiates E-Synthesis from standard evidence rankings however is
that these have predominantly been formalised as lexicographic decision rules.
This means that higher-level studies trump lower-level ones: when two studies
of different levels deliver contradictory findings, then the one higher in the
evidence hierarchy is considered more reliable and allows one to discard the
lower level one.21 E-Synthesis encapsulates the rationale for ranking evidence (in
the inequalities across probabilistic dependencies between causation and various
indicators), but at the same time allows one to take into account all evidence,
and to act accordingly, as soon as the probability of the causal hypothesis goes
above the threshold established by the other dimensions of the decision (utility of
withdrawing/not withdrawing the drug, conditional on the probability of it causing
the suspected harm) (see De Pretis et al. 2019, Section 22).22
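The contrast between a lexicographic rule and probabilistic amalgamation with a decision
threshold can be sketched as follows. The study ranks, likelihood ratios and losses are
hypothetical, and the amalgamation step assumes that the studies are conditionally independent
given the hypothesis, a simplification that E-Synthesis itself does not require since it models
the dependence structure explicitly.

```python
def lexicographic_verdict(studies):
    """'Take the best': the top-ranked study decides, the others are discarded.

    studies: list of (rank, supports_harm) pairs; rank 1 is the top of the hierarchy.
    """
    return min(studies, key=lambda study: study[0])[1]


def amalgamated_probability(prior, likelihood_ratios):
    """Posterior probability of harm after updating on every study.

    Assumes conditional independence of the studies given the hypothesis.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


def should_withdraw(p_harm, loss_if_harm_ignored, loss_if_withdrawn_needlessly):
    """Withdraw when the expected loss of inaction exceeds that of withdrawal."""
    threshold = loss_if_withdrawn_needlessly / (
        loss_if_withdrawn_needlessly + loss_if_harm_ignored
    )
    return p_harm > threshold


# One null RCT at the top of the hierarchy, three lower-ranked studies signalling harm.
studies = [(1, False), (3, True), (3, True), (4, True)]
print(lexicographic_verdict(studies))  # the RCT alone decides

p_harm = amalgamated_probability(prior=0.1, likelihood_ratios=[0.5, 3.0, 3.0, 2.0])
print(round(p_harm, 3), should_withdraw(p_harm, 10.0, 1.0))
```

Under the lexicographic rule the three lower-ranked signals are simply discarded, whereas
amalgamation lets them outweigh the null trial, and the asymmetry of the losses sets the
probability at which action becomes warranted.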


15.5.3 Causal Holism 


Methodological pluralists such as Cartwright (2011; 2007b) and Stegenga (2011),
among others, express concerns against the privileged role of RCTs also on the grounds
that classical ‘linear’ approaches to causal inference cannot do justice to the
complexity of causal phenomena in the biological and social sciences, characterized
by nonlinear causation and causal interactions.23


21A somewhat unwanted consequence of this “take the best” approach is that it has become
commonplace to assume an uncommitted attitude towards observed associations unless they are
“proved” by gold standard evidence (see the still ongoing debate on the possible causal association
between paracetamol and asthma: Osimani (2014)).

22This also complies with the precautionary principle in risk assessment and with how decisions
should be made in health settings: Osimani (2007, 2013) and Osimani et al. (2011).

Strictly speaking this sort of criticism does not deny that:


\[ \Delta \supset ©, \]

but only denies the reverse:

\[ © \supset \Delta. \]


Since in the causal graph literature the defining features for causality jointly
entail that $\Delta \supset ©$ but not the reverse, this criticism misses the point. We help
to deflate the debate by not collapsing $\Delta$ and © into a single node, thereby allowing
causation to be holistic and hence not reducible to difference-making, while at the
same time letting difference-making immediately imply causation: in E-Synthesis,
when a difference-making relationship between two events or variables holds,
this is a sufficient (although not necessary) condition for causality. This can be
characterized in logical terms as an entailment relationship: $\Delta \supset ©$. Hence, in
E-Synthesis, the probability of a causal relationship, given a genuine difference-making
relationship, is 1: $P(© \mid \Delta) = 1$. The inverse entailment though, $© \supset \Delta$,
does not hold: knowledge of © does not necessitate the existence of difference-making,
e.g., in cases of “holistic causation”.24


15.5.4 Evidence of Mechanisms and Relevance 


Pluralists accord to evidence of mechanisms a preeminent role in establishing
external validity and extrapolation, in that it helps evaluate whether the cause
under investigation will work in a similar “context” also in the target population
(Russo and Williamson 2007; Cartwright 2007a; Cartwright and Stegenga 2011).
The present framework, however, formally distinguishes the role of evidence of
mechanisms for the purpose of causal assessment from its role for the purpose of
establishing external validity, by associating the latter with the relevance node RLV.
This allows us to explicitly distinguish the different kinds of inductive risk involved
in the inference: (1) from dirty statistical data to causal indicator(s), and then to
causality, in a specific population; (2) the extrapolation of a causal link established
in a given population/model to another population/model. More importantly, adding
a relevance node to the evidence reports allows even highly reliable RCTs to
provide little evidential support if they are not considered to be relevant to the target
population.


23In the same vein, modular conceptualizations of causes such as the ones implied in the causal
graph methodology developed by Pearl (2000) and by Glymour, Spirtes and colleagues (Spirtes et al.
2000; see also Woodward (2003)) are also under attack for failing to recognize that causes may be
holistic and therefore may not be adequately captured by a difference-making account.

24This responds to concerns expressed among others by Cartwright (2007c), Mumford and Anjum
(2011), Anjum (2012), and Kerry et al. (2012).


15.5.5 Reliability and Higher Order Evidence 


By introducing a reliability node Rel, and thereby breaking up the different 
dimensions of evidence (strength, relevance, reliability) E-Synthesis allows them 
to be explicitly tracked in the body of evidence. This makes it possible to parcel out 
the strength of evidence from the method with which it was obtained.25 With this,
E-Synthesis provides a higher order perspective on evidential support by effectively
embedding these various epistemic dimensions in a concrete topology. Indeed, the 
framework presented here also provides a fruitful platform for integrating insights 
developed in the philosophy of science around such topics as the role of replication 
in assessing the reliability of evidence (Open Science Collaboration 2015; Meehl 
1990; Lamal 1990; Hempel 1968; Platt 1964), as well as the confirmatory role of 
explanatory power (McGrew 2003; Crupi et al. 2013; Cohen 2016; Lipton 2003) and 
coherence (Dietrich and Moretti 2005; Moretti 2007; Wheeler and Scheines 2013; 
Fitelson 2003; Bovens and Hartmann 2003). This paper focuses on inference within
one model, rooted in one hypothesis, but E-Synthesis allows for going beyond
the network’s limits and for embedding it in an even larger network to trace the
hypothesis’ relation with other potentially concurring hypotheses. The mechanics
of Bayesian epistemology are flexible enough to permit such an augmentation for
the purposes of tracing further inference patterns. The framework is currently being
developed into a concrete tool for evidence amalgamation (see Landes et al. (2018))
and possibly into software.


25Osimani and Landes (2020) investigate the various concepts of reliability involved in such
considerations.